Breast Cancer Detection

Spoiler alert! One of the networks achieves 98% classification accuracy.

A Binary Imbalanced Classification Example

Breast cancer is the most common malignancy among women globally. In 2020 it surpassed lung cancer as the leading cause of global cancer incidence, with an estimated 2.3 million new cases, representing 11.7% of all cancer cases. In India, its incidence has increased significantly: according to Globocan 2020 data, breast cancer accounted for 13.5% of all cancer cases and 10.6% of all cancer deaths in India.

A Glimpse into Our Data

Histopathology images : Benign & Malignant

Seen side by side, the two images are hard to tell apart: an untrained eye cannot make out which one is cancerous and which is benign.

Evolution of Cancer

This gives an indication of how cancer cells look different from normal ones.

The dataset can be accessed from the Laboratório Visão Robótica e Imagem website. It is divided into two main groups: benign and malignant tumours. A malignant tumour is synonymous with cancer: the lesion can invade and destroy adjacent structures (locally invasive) and spread to distant sites (metastasise), causing death. We use the images taken at 100x magnification.

Data Preparation

Train Validation Test Split

We split the data into three parts for training, validating and testing the model. Roughly 80% of the images (1648) go to training and about 10% (208) each to validation and testing.

Class     | Total (original data) | Train | Validation | Test
Benign    |                   640 |   512 |         64 |   64
Malignant |                  1424 |  1136 |        144 |  144
Total     |                  2064 |  1648 |        208 |  208

Based on this split, we use batch_size = 16, which divides both 1648 and 208 exactly. Batch size is one of the most important hyperparameters to tune in deep learning. Very small batches give noisy gradient estimates and make training unstable, while very large batches tend to generalise less well. With batches of 16, steps_per_epoch = 103 for training, and 13 steps cover the validation and test sets (because 16*103 = 1648 and 16*13 = 208).
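The step counts can be checked with a few lines of base R. This is a minimal sketch that hard-codes the split sizes from the table above. When a size is not a multiple of the batch size, ceiling(n / batch_size) would be the usual choice so that the final partial batch is still seen.

```r
# Split sizes from the table above
n_train <- 512 + 1136   # 1648 images
n_val   <- 64 + 144     # 208 images
n_test  <- 64 + 144     # 208 images
batch_size <- 16

# Steps needed to see every image exactly once per pass
steps_per_epoch  <- n_train / batch_size   # 103
validation_steps <- n_val / batch_size     # 13
test_steps       <- n_test / batch_size    # 13

# 16 divides all three sizes, so no partial batch is left over
c(n_train, n_val, n_test) %% batch_size    # 0 0 0
```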

The first chunk of code sets up the directories. It then copies images from the benign and malignant folders into the train, validation and test folders. To pick files at random, we shuffle both the file names and a vector of split labels with sample(), and then use the labels as indexes.
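The label-shuffling trick can be isolated into a small helper. This is a sketch only. split_files() is a hypothetical name that does not appear in the original code, and it returns the three groups of file names rather than copying any files.

```r
# Hypothetical helper: split file names into train/validation/test
# by shuffling a vector of split labels and grouping names by label
split_files <- function(fnames, n_train, n_val, n_test) {
  stopifnot(length(fnames) == n_train + n_val + n_test)
  labels <- sample(rep(c("train", "validation", "test"),
                       times = c(n_train, n_val, n_test)))
  split(fnames, factor(labels, levels = c("train", "validation", "test")))
}

set.seed(1)                                  # reproducible shuffle
benign <- paste0("benign (", 1:640, ").png")
parts <- split_files(benign, 512, 64, 64)
lengths(parts)                               # train 512, validation 64, test 64
```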

Code
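# Set directory (adjust to wherever the 100x images are stored);
# paste0() keeps the long path on two lines without embedding a newline
base_dir = paste0("C:/Users/KUNAL/Downloads/#R coding/#Books/covered/",
                  "#Book - Manning - Deep Learning with R and Keras/## Article/100x")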
benign_dir = file.path(base_dir, "benign")
malignant_dir = file.path(base_dir, "malignant")

# Create respective folders
train_dir = file.path(base_dir, "train")
validation_dir = file.path(base_dir, "validation")
test_dir = file.path(base_dir, "test")
for (d in c(train_dir, validation_dir, test_dir)) dir.create(d, showWarnings = FALSE)

# Creating respective sub-folders
train_benign_dir         <- file.path(train_dir, "benign")
validation_benign_dir    <- file.path(validation_dir, "benign")
test_benign_dir          <- file.path(test_dir, "benign")
train_malignant_dir      <- file.path(train_dir, "malignant")
validation_malignant_dir <- file.path(validation_dir, "malignant")
test_malignant_dir       <- file.path(test_dir, "malignant")
for (d in c(train_benign_dir, validation_benign_dir, test_benign_dir,
            train_malignant_dir, validation_malignant_dir, test_malignant_dir))
  dir.create(d, recursive = TRUE, showWarnings = FALSE)  # no-op if it exists

# Reading in the file names
benign_fnames = paste0("benign (", 1:640, ").png")
benign_fnames = sample(benign_fnames)        # randomize pictures
malignant_fnames = paste0("malignant (", 1:1424, ").png")
malignant_fnames = sample(malignant_fnames)

# creating indices for train-validation-test split
benign_index = c(rep(1,512),rep(2,64),rep(3,64))
benign_index = sample(benign_index)       # randomize indexes
malignant_index = c(rep(1,1136),rep(2,144),rep(3,144))
malignant_index = sample(malignant_index)

# separating file names as per indices
train_benign_fnames = benign_fnames[benign_index==1]
validation_benign_fnames = benign_fnames[benign_index==2]
test_benign_fnames = benign_fnames[benign_index==3]
#
train_malignant_fnames = malignant_fnames[malignant_index==1]
validation_malignant_fnames = malignant_fnames[malignant_index==2]
test_malignant_fnames = malignant_fnames[malignant_index==3]

# copying files to respective folders
# file.copy(file.path(<old_location>, <file_name>),file.path(<new_location>))
file.copy(file.path(benign_dir, train_benign_fnames),
          file.path(train_benign_dir))
file.copy(file.path(benign_dir, validation_benign_fnames),
          file.path(validation_benign_dir))
file.copy(file.path(benign_dir, test_benign_fnames),
          file.path(test_benign_dir))
file.copy(file.path(malignant_dir, train_malignant_fnames),
          file.path(train_malignant_dir))
file.copy(file.path(malignant_dir, validation_malignant_fnames),
          file.path(validation_malignant_dir))
file.copy(file.path(malignant_dir, test_malignant_fnames),
          file.path(test_malignant_dir))
# Checking
cat("total training benign images:", 
    length(list.files(train_benign_dir)), "\n")
cat("total training malignant images:", 
    length(list.files(train_malignant_dir)), "\n")
cat("total validation benign images:", 
    length(list.files(validation_benign_dir)), "\n")
cat("total validation malignant images:", 
    length(list.files(validation_malignant_dir)), "\n")
cat("total test benign images:", 
    length(list.files(test_benign_dir)), "\n")
cat("total test malignant images:", 
    length(list.files(test_malignant_dir)), "\n")

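The image counts in the train, validation and test folders match the split above. Malignant images outnumber benign ones by about 2.2 to 1, so this is an imbalanced classification problem. To compensate, we give each class the weight total / (2*class count), so that both classes contribute equally to the loss: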

  • For Benign : 2064/(2*640) = 1.6125
  • For Malignant : 2064/(2*1424) = 0.72471910112
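The same weights can be computed instead of typed by hand. The following is a minimal base-R sketch that assumes the class counts above. The conversion to a list keyed by "0" and "1" mirrors the class_weight list in the training code. It relies on flow_images_from_directory() assigning class indices in alphabetical order of the sub-folder names, so "benign" maps to 0 and "malignant" to 1.

```r
# Class counts in the full dataset (100x magnification)
counts <- c(benign = 640, malignant = 1424)

# Balanced weighting: total / (number of classes * class count)
weights <- sum(counts) / (length(counts) * counts)
weights   # benign 1.6125, malignant ~0.7247

# Named list keyed by class index, in the form passed to fit(class_weight = ...)
class_weight <- setNames(as.list(unname(weights)), c("0", "1"))
```

Each class then carries the same total weight: 640 * 1.6125 = 1424 * 0.7247... = 1032, which is half of the 2064 images.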

Convolutional Neural Network

The next chunk of code sets up and trains a basic convolutional neural network.

A callback is an object that the model calls at set points during training, such as the end of each epoch. It lets us monitor progress, interrupt training or adjust settings on the fly. The callbacks relevant here are:

  • callback_early_stopping() -> Interrupts training once a monitored metric has stopped improving for a fixed number of epochs, which helps avoid overfitting. Here, training stops if the validation loss has not improved for 3 epochs (patience = 3).
  • callback_model_checkpoint() -> Saves the model during training. With save_best_only = TRUE it keeps only the version with the best monitored score.
  • callback_reduce_lr_on_plateau() -> Lowers the learning rate when the validation loss stops improving. Here, the learning rate is multiplied by 0.3 whenever the validation loss has not improved for 2 epochs (patience = 2), down to a minimum of 1e-6.
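The following base-R simulation shows how early stopping (patience = 3) and learning-rate reduction (patience = 2, factor = 0.3) interact over a made-up sequence of validation losses. It is a simplified sketch, not the Keras implementation. It ignores details such as min_delta and cooldown, and simulate_callbacks() is a hypothetical helper.

```r
# Simplified simulation of early stopping + reduce-LR-on-plateau.
# Ignores Keras details such as min_delta and cooldown.
simulate_callbacks <- function(val_loss, lr = 0.001, factor = 0.3,
                               lr_patience = 2, stop_patience = 3,
                               min_lr = 1e-6) {
  best <- Inf
  wait_lr <- 0
  wait_stop <- 0
  lrs <- numeric(0)
  for (epoch in seq_along(val_loss)) {
    lrs[epoch] <- lr                        # learning rate used in this epoch
    if (val_loss[epoch] < best) {           # improvement: reset both counters
      best <- val_loss[epoch]
      wait_lr <- 0
      wait_stop <- 0
    } else {
      wait_lr <- wait_lr + 1
      wait_stop <- wait_stop + 1
      if (wait_lr >= lr_patience) {         # plateau: shrink the learning rate
        lr <- max(lr * factor, min_lr)
        wait_lr <- 0
      }
      if (wait_stop >= stop_patience) break # no improvement for too long: stop
    }
  }
  list(epochs_run = epoch, lrs = lrs)
}

val_loss <- c(0.60, 0.50, 0.45, 0.46, 0.47, 0.48, 0.40)  # made-up losses
res <- simulate_callbacks(val_loss)
res$epochs_run  # 6: the improvement at epoch 7 is never reached
res$lrs         # 0.001 for epochs 1-5, then 0.0003 for epoch 6
```

With these losses, the learning rate drops after epoch 5. Training then halts one epoch later because the loss has not improved for three epochs in a row.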
Code
library(keras)

# Setting Hyperparameters
batch_size = 16 
img_height = 150
img_width = 150

# Data augmentation (training images only)
datagen = image_data_generator(rescale = 1/255, zoom_range = 0.2,
              horizontal_flip = TRUE, vertical_flip = TRUE, rotation_range = 90)
# Validation and test images are only rescaled, never augmented
test_datagen <- image_data_generator(rescale = 1/255)
train_generator <- flow_images_from_directory(train_dir,
                    datagen, target_size = c(img_height, img_width),
                    batch_size = batch_size, class_mode = "binary")
val_generator <- flow_images_from_directory(validation_dir, test_datagen,
                    target_size = c(img_height, img_width),
                    batch_size = batch_size, class_mode = "binary")
test_generator <- flow_images_from_directory(test_dir, test_datagen,
                    target_size = c(img_height, img_width),
                    batch_size = batch_size,
                    class_mode = "binary", shuffle = FALSE)

# Setting callbacks
early  = callback_early_stopping(monitor = "val_loss",mode = "min",patience = 3)

lr_red = callback_reduce_lr_on_plateau(monitor = "val_loss",
                                       patience = 2,
                                       verbose = 1,
                                       factor = 0.3,
                                       min_lr = 0.000001)

callback_list = list(early,lr_red)
class_weight = list("0"=1.6125,"1"= 0.72471910112)
# For Benign : 2064/(2*640) = 1.6125
# For Malignant : 2064/(2*1424) = 0.72471910112

# Model Structure
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(img_height, img_width, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%  
  layer_flatten() %>%
  layer_dropout(rate = 0.5) %>% 
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")

# Compile
model %>% compile(loss="binary_crossentropy",
                  optimizer=optimizer_rmsprop(learning_rate = 0.001), 
                  metrics=c("acc"))

# Train
history <- model %>% fit(train_generator,
                         steps_per_epoch = 103,
                         epochs = 25,
                         validation_data = val_generator,
                         validation_steps = 13,
                         callbacks = callback_list,
                         class_weight = class_weight)

# Test
model %>% evaluate(test_generator, steps = 13)

Table 1 : How different CNN model structures perform (accuracy)
No. | Optimizer | Conv layer filters   | Dense layer units | Epochs run | Training | Validation | Test
1   | Adam      | 32, 32, 64, 64, 128  | 128, 64           | 12/25      | 82.58%   | 85.10%     | 81.25%
2   | Adam      | 32, 32, 64, 64, 128  | 64, 32            | 23/25      | 87.08%   | 87.98%     | 82.69%
3   | Adam      | 32, 32, 64, 128, 128 | 128, 64           | 11/25      | 87.08%   | 87.98%     | 82.69%
4   | RMSprop   | 32, 32, 64, 64, 128  | 512               | 14/25      | 85.92%   | 87.98%     | 84.13%
5   | RMSprop   | 32, 32, 64, 64, 128  | 64, 32            | 11/25      | 87.56%   | 88.46%     | 87.01%**
** best model

Model 5 in the table above is the best performer so far, with a test accuracy of 87%.

Depth-wise Separable Convolution

A depthwise separable convolution first learns spatial features from each input channel independently (a depthwise convolution). It then mixes the channels with a pointwise (1x1) convolution. This factorisation needs far fewer parameters and computations than a regular convolution, so training is faster. It also tends to learn good representations from less data, which often yields better models and makes it especially useful for small datasets.
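The parameter saving can be made concrete by counting weights for both layer types. This sketch assumes a depth multiplier of 1 and a bias only on the pointwise step, which matches the default separable layer in Keras. The helper names are hypothetical.

```r
# Trainable parameters of a standard 2-D convolution (with bias)
conv2d_params <- function(k, c_in, c_out) {
  k * k * c_in * c_out + c_out
}

# Trainable parameters of a depthwise separable convolution
# (depth multiplier 1, bias only on the pointwise step)
separable_params <- function(k, c_in, c_out) {
  depthwise <- k * k * c_in      # one k x k filter per input channel
  pointwise <- c_in * c_out      # 1x1 convolution mixing the channels
  depthwise + pointwise + c_out  # plus one bias per output channel
}

conv2d_params(3, 32, 64)     # 18496
separable_params(3, 32, 64)  # 2400
conv2d_params(3, 3, 32)      # 896 (an RGB input layer with 32 filters)
separable_params(3, 3, 32)   # 155
```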

Code
# Network Structure
model <- keras_model_sequential() %>% 
  layer_separable_conv_2d(filters = 32, kernel_size = 3, activation = "relu",
                          input_shape = c(img_height, img_width, 3)) %>% 
  layer_separable_conv_2d(filters = 64, kernel_size = 3, activation = "relu") %>% 
  layer_max_pooling_2d(pool_size  = 2) %>% 
  layer_separable_conv_2d(filters = 64, kernel_size = 3, activation = "relu") %>% 
  layer_separable_conv_2d(filters = 128, kernel_size = 3, activation = "relu") %>% 
  layer_max_pooling_2d(pool_size  = 2) %>% 
  layer_separable_conv_2d(filters = 64, kernel_size = 3, activation = "relu") %>% 
  layer_separable_conv_2d(filters =128, kernel_size = 3, activation = "relu") %>% 
  layer_global_average_pooling_2d() %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")

# Compile
model %>% compile(loss="binary_crossentropy",
                  optimizer=optimizer_rmsprop(learning_rate = 0.001), 
                  metrics=c("acc"))

# Train
history <- model %>% fit(train_generator,
                         steps_per_epoch = 103,
                         epochs = 25,
                         validation_data = val_generator,
                         validation_steps = 13,
                         callbacks = callback_list,
                         class_weight = class_weight)

# Test
model %>% evaluate(test_generator, steps = 13)

Structure of Separable CNN

Table 2 : How different Separable CNN model structures perform (accuracy)
No. | Block 1 filters | Block 2 filters | Block 3 filters | Dense units | Epochs run | Training | Validation | Test
1   | 32, 64          | 32, 64          | 64, 128         | 32          | 8/25       | 69.93%   | 69.23%     | 69.23%
2   | 32, 64          | 64, 128         | 128, 256        | 32          | 4/25       | 57.77%   | 30.77%     | 30.76%
3   | 64, 128         | 64, 128         | 64, 128         | 32          | 5/25       | 68.23%   | 69.23%     | 69.23%
4   | 64, 128         | 64, 128         | 128, 256        | 32          | 4/25       | 34.34%   | 30.77%     | 30.76%
5   | 32, 64          | 64, 128         | 64, 128         | 64          | 10/25      | 77.91%   | 82.21%     | 77.40%
6   | 16, 32          | 32, 64          | 64, 128         | 32          | 11/25      | 83.19%   | 87.02%     | 79.80%
7** | 32, 64          | 64, 128         | 64, 128         | 32          | 20/25      | 85.44%   | 87.98%     | 85.58%
** best model

Model 7 is the best performer among the depthwise separable CNNs trained here. Its first block has two separable convolution layers with 32 and 64 filters. The second and third blocks each have two layers with 64 and 128 filters, and the network ends with a dense layer of 32 units.
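Several rows in Table 2 sit at 69.23% or about 30.77% on validation and test. These are exactly the accuracies of a model that labels every image malignant (144 of 208) or every image benign (64 of 208). That suggests those networks collapsed to predicting a single class. A quick check in R, assuming the 64/144 validation and test split from the data-preparation table:

```r
# Validation/test sets each hold 64 benign and 144 malignant images
n_benign <- 64
n_malignant <- 144
n <- n_benign + n_malignant

# Accuracy (%) of a "classifier" that always predicts the same class
always_malignant <- 100 * n_malignant / n   # about 69.23
always_benign    <- 100 * n_benign / n      # about 30.77
round(c(always_malignant = always_malignant, always_benign = always_benign), 2)
```

Any model worth keeping has to beat the 69.23% majority-class baseline, not just 50%.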

DenseNet201

DenseNet-201 is a convolutional neural network that is 201 layers deep. The pre-trained version used here was trained on more than a million images from the ImageNet database and can classify images into 1000 object categories, such as keyboard, mouse, pencil and many animals. As a result, the network has learnt rich feature representations for a wide range of images, which we expect to help with our breast cancer histopathology image classification problem.

Code
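# Pre-trained network = DenseNet201 (ImageNet weights, without its classifier head).
# conv_base is not frozen, so its weights are fine-tuned along with the new layers.
conv_base = application_densenet201(weights = "imagenet", include_top = FALSE,
                                    input_shape = c(img_height, img_width, 3))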
# Model Structure
model = keras_model_sequential() %>% 
  conv_base() %>% 
  layer_global_average_pooling_2d() %>% 
  layer_dropout(rate = 0.5) %>% 
  layer_batch_normalization() %>% 
  layer_dense(units = 1, activation = "sigmoid")

summary(model)

Model Structure

Code
# Compile
model %>% compile(loss="binary_crossentropy",
                  optimizer=optimizer_adam(learning_rate = 1e-4), 
                  metrics=c("acc"))

# Callbacks
early  = callback_early_stopping(monitor = "val_loss",
                                 mode = "min",
                                 patience = 3)
lr_red = callback_reduce_lr_on_plateau(monitor = "val_loss",
                                       patience = 2,
                                       verbose = 1, 
                                       factor = 0.2,
                                       min_lr = 1e-7)
callback_list = list(early,lr_red)
class_weight = list("0"=1.6125,"1"= 0.72471910112)

# Train
history <- model %>% fit(train_generator,
                         steps_per_epoch = 103,
                         epochs = 25,
                         validation_data = val_generator,
                         validation_steps = 13,
                         callbacks = callback_list,
                         class_weight = class_weight)

# Test
model %>% evaluate(test_generator, steps = 13)

# Save Model
model %>% save_model_hdf5("breast_cancer_detection.h5")

Training Logs for Optimizer Adam

Accuracy Plot : Optimizer Adam

Training Logs for Optimizer RMSprop

Accuracy Plot : Optimizer RMSprop

Table 3 : How different optimizers perform with the pre-trained DenseNet201 network (accuracy)
No. | Model                          | Training | Validation | Test
1** | DenseNet201, Adam optimizer    | 98.67%   | 99.52%     | 98.07%
2   | DenseNet201, RMSprop optimizer | 96.36%   | 99.52%     | 96.15%
** best model

DenseNet201 with the Adam optimizer achieved 98.07% test accuracy!
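The test set holds only 208 images, so it helps to translate the percentages in Table 3 back into image counts. This small sketch assumes that the reported accuracies are rounded from whole-image counts.

```r
n_test <- 208                                  # 64 benign + 144 malignant
test_acc <- c(adam = 98.07, rmsprop = 96.15)   # test accuracy (%) from Table 3
correct <- round(test_acc / 100 * n_test)      # images classified correctly
misclassified <- n_test - correct
misclassified                                  # adam 4, rmsprop 8
100 / n_test                                   # one image is worth ~0.48 percentage points
```

On this test set, the difference between the two optimizers comes down to four images.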